g13bef
g13bef
© Numerical Algorithms Group, 2002.
Purpose
G13BEF Multivariate time series, estimation of multi-input model
Synopsis
[para,xxy,itc,sd,cm,s,d,ndf,res,sttf,nsttf,zsp,ifail] = g13bef(mr,mt,para,...
xxy<,kzsp,zsp,kfc,kpriv,kef,kzef,nit,ifail>)
Description
3.1. The Multi-input Model
The output series y , for t=1,2,...,n, is assumed to be the sum
t
of (unobserved) components z which are due respectively to the
i,t
inputs x ,t, for i=1,2,...,m.
i
Thus y =z +...+z +n where n is the error, or output noise
t 1,t m,t t t
component.
A typical component z may be either:
t
(a) A simple regression component, z =(omega)x (here x is
t t t
called a simple input) or
(b) A transfer function model component which allows for the
effect of lagged values of the variable, related to x by
t
z =(delta) z +(delta) z +...+(delta) z +(omega) x
t 1 t-1 2 t-2 p t-p 0 t-b
-(omega) x -...-(omega) x .
1 t-b-1 q t-b-q
The noise n is assumed to follow a (possibly seasonal) ARIMA
t
model, i.e., may be represented in terms of an uncorrelated
series, a , by the hierarchy of equations:
t
d D
(c) (nabla) (nabla) n =c+w
s t t
(d) w = (Phi) w +(Phi) w +...
t 1 t-s 2 t-2*s
+(Phi) w +e -(Theta) e -(Theta) e -...
P t-P*s t 1 t-s 2 t-2*s
-(Theta) e
Q t-Q*s
(e) e = (phi) e +(phi) e +...
t 1 t-1 2 t-2
+(phi) e +a -(theta) a -(theta) a -...
p t-p t 1 t-1 2 t-2
-(theta) a .
q t-q
Note: the orders p,q appearing in each of the transfer function
models and the ARIMA model are not necessarily the same;
d D
(nabla) (nabla) n is the result of applying non-seasonal
s t
differencing of order d and seasonal differencing of seasonality
s and order D to the series n , the differenced series is then of
t
length N=n-d-s*D; the constant term parameter c may optionally be
held fixed at its initial value (usually, but not necessarily
zero) rather than being estimated.
For the purpose of defining an estimation criterion it is assumed
that the series a is a sequence of independent Normal variates
t
2
having mean 0 and variance (sigma) . An allowance has to be made
a
for the effects of unobserved data prior to the observation
period. For the noise component an allowance is always made using
a form of backforecasting.
For each transfer function input, the user has to decide what
values are to be assumed for the pre-period terms z ,z ,...,z
0 -1 1-p
and x ,x ,...,x which are in theory necessary to re-create
0 -1 1-b-q
the component series z ,z ,...,z , during the estimation
1 2 n
procedure.
The first choice is to assume that all these values are zero. In
this case in order to avoid undesirable transient distortion of
the early values z ,z ,..., the user is advised first to correct
1 2
the input series x by subtracting from all the terms a suitable
t
constant to make the early values x ,x ,..., close to zero. The
1 2
_
series mean x is one possibility, but for a series with strong
trend, the constant might be simply x .
1
The second choice is to treat the unknown pre-period terms as
nuisance parameters and estimate them along with the other
parameters. This choice should be used with caution. For example,
if p=1 and b=q=0, it is equivalent to fitting to the data a
t
decaying geometric curve of the form A(delta) , for t=1,2,3,...,
along with the other inputs, this being the form of the
transient. If the output y contains a strong trend of this form,
t
which is not otherwise represented in the model, it will have a
tendency to influence the estimate of (delta) away from the value
appropriate to the transfer function model.
In most applications the first choice should be adequate, with
the option possibly being used as a refinement at the end of the
modelling process. The number of nuisance parameters is then
max(p,b+q), with a corresponding loss of degrees of freedom in
the residuals. If the user aligns the input x with the output by
t
using in its place the shifted series x , then setting b=0 in
t-b
the transfer function model, there is some improvement in
efficiency. On some occasions when the model contains two or more
inputs, each with estimation of pre-period nuisance parameters,
these parameters may be co-linear and lead to failure of the
routine. The option must then be 'switched off' for one or more
inputs.
3.2. The Estimation Criterion
This is a measure of how well a proposed set of parameters in the
transfer function and noise ARIMA models, matches the data. The
estimation routine searches for parameter values which minimize
this criterion. For a proposed set of parameter values it is
derived by calculating:
(i) the components z ,z ,...,z as the responses to the
1,t 2,t m,t
input series x ,x ...,x using the equations (a) or
1,t 2,t m,t
(b) above,
(ii) the discrepancy between the output and the sum of these
components, as the noise
n =y -(z +z +...+z ),
t t 1,t 2,t m,t
(iii) the residual series a from n by reversing the recursive
t t
equations (c), (d) and (e) above.
This last step again requires treatment of the effect of unknown
pre-period values of n and other terms in the equations
t
regenerating a and leads to a criterion which is a sum of
t
squares function S, of the residuals a . It may be shown that
t
the finite algorithm presented there is equivalent to taking the
infinite set of past values n ,n ,n ,..., as (linear) nuisance
0 -1 -2
parameters. There is no loss of degrees of freedom however,
because the sum of squares function S may be expressed as
including the corresponding set of past residuals --
n
-- 2
S= > a .
-- t
-infty
The function D=S is the first of the three possible criteria, and
is quite adequate for moderate to long series with no seasonal
parameters. The second is the exact likelihood criterion which
considers the past set n ,n ,n not as simple nuisance
0 -1 -2
parameters, but as unobserved random variables with known
distribution. Calculation of the likelihood of the observed set
n ,n ,...,n requires theoretical integration over the range of
1 2 n
the past set. Fortunately this yields a criterion of the form
D=M*S (whose minimization is equivalent to maximizing the exact
likelihood of the data), where S is exactly as before, and the
multiplier M is a function calculated from the ARIMA model
parameters. The value of M is always >= 1, and M tends to 1 for
any fixed parameter set as the sample size n tends to infty.
There is a moderate computational overhead in using this option,
but its use avoids appreciable bias in the ARIMA model parameters
and yields a better conditioned estimation problem.
The third criterion of marginal likelihood treats the
coefficients of the simple inputs in a manner analogous to that
given to the past set n ,n ,n ,.... These coefficients,
0 -1 -2
together with the constant term c used to represent the mean of
w , are in effect treated as random variables with highly
t
dispersed distributions. This leads to the criterion D=M*S again,
but with a different value of M which now depends on the simple
input series values x . In the presence of a moderate to large
t
number of simple inputs, the marginal likelihood criterion can
counteract bias in the ARIMA model parameters which is caused by
estimation of the simple inputs. This is particularly important
in relatively short series.
G13BEF can be used with input series present, to estimate a
univariate ARIMA model for the ouput alone. The marginal
likelihood criterion is then distinct from exact likelihood only
if a constant term is being estimated in the model, because this
is treated as an implicit simple input.
3.3. The Estimation Procedure
This is the minimization of the estimation criterion or objective
function D (for deviance). The routine uses an extension of the
algorithm of Marquardt. The step size in the minimization is
inversely related to a parameter (alpha), which is increased or
decreased by a factor (beta) at successive iterations, depending
on the progress of the minimization. Convergence is deemed to
have occurred if the fractional reduction of D in successive
iterations is less than a value (gamma), while (alpha)<1.
Certain model parameters (in fact all excluding the (omega)'s)
are subject to stability constraints which are checked throughout
to within a specified tolerance multiple (delta) of machine
accuracy. Using the least-squares criterion, the minimization may
halt prematurely when some parameters 'stick' at a constraint
boundary. This can happen particularly with short seasonal series
(with a small number of whole seasons). It will not happen using
the exact likelihood criterion, although convergence to a point
on the boundary may sometimes be rather slow, because the
criterion function may be very flat in such a region. There is
also a smaller risk of a premature halt at a constraint boundary
when marginal likelihood is used.
A positive, or zero number of iterations can be specified. In
either case, the value D of the objective function at iteration
zero is presented at the initial parameter values, except for
estimation of any pre-period terms for the input series,
backforecasts for the noise series, and the coefficients of any
simple inputs, and the constant term (unless this is held fixed).
At any later iteration, the value of D is supplied after re-
estimation of the backforecasts to their optimal values,
corresponding to the model parameters presented at that
iteration. This is not true for any pre-period terms for the
input series which, although they are updated from the previous
iteration, may not be precisely optimal for the parameter values
presented, unless convergence of those parameters has occurred.
However, in the case of marginal likelihood being specified, the
coefficients of the simple inputs and the constant term are also
re-estimated together with the backforecasts at each iteration,
to values which are optimal for the other parameter values
presented.
3.4. Further Results
S
The residual variance is taken as erv= -- where df=N-(total
df
number of parameters estimated), is the residual degrees of
freedom. The pre-period nuisance parameters for the input series
are included in the reduction of df, as is the constant if it is
estimated.
The covariance matrix of the vector of model parameter estimates
is given by
-1
erv*H
where H is the linearised least-squares matrix taken from the
final iteration of the algorithm of Marquardt. From this
expression are derived the vector of standard deviations, and the
correlation matrix of parameter estimates. These are
approximations which are only valid asymptotically, and must be
treated with great caution when the parameter estimates are close
to their constraint boundaries.
The residual series a is available upon completion of the
t
iterations over the range t=1+d+s*D,...,n corresponding to the
differenced noise series w .
t
Because of the algorithm used for backforecasting, these are only
true residuals for t>=1+q+s*Q-p-s*P-d-s*D, provided this is
positive. Estimation of pre-period terms for the inputs will also
tend to reduce the magnitude of the early residuals, sometimes
severely.
The model component series z ,...,z and n may optionally be
1,t m,t t
returned in place of the supplied series values, in order to
assess the effects of the various inputs on the output.
Parameters
g13bef
Required Input Arguments:
mr (7) integer
mt (4,:) integer
para (:) real
xxy (:,:) real
Optional Input Arguments: <Default>
kzsp integer 0
zsp (4) real zeros(4,1)
kfc integer 1
kpriv integer 0
kef integer 3
kzef integer 0
nit integer 50
ifail integer -1
Output Arguments:
para (:) real
xxy (:,:) real
itc integer
sd (:) real
cm (:,:) real
s real
d real
ndf integer
res (:) real
sttf (:) real
nsttf integer
zsp (4) real
ifail integer